Internal Rewards Mitigate Agent Boundedness

Authors

  • Jonathan Sorg
  • Satinder P. Singh
  • Richard L. Lewis
Abstract

Reinforcement learning (RL) research typically develops algorithms for helping an RL agent best achieve its goals—however they came to be defined—while ignoring the relationship of those goals to the goals of the agent designer. We extend agent design to include the meta-optimization problem of selecting internal agent goals (rewards) which optimize the designer’s goals. Our claim is that well-designed internal rewards can help improve the performance of RL agents which are computationally bounded in some way (as practical agents are). We present a formal framework for understanding both bounded agents and the meta-optimization problem, and we empirically demonstrate several instances of common agent bounds being mitigated by general internal reward functions.
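The meta-optimization problem described in the abstract can be pictured with a toy sketch. Everything below is a hypothetical illustration, not the authors' environments or algorithm. It assumes a 10-cell corridor where the designer rewards only reaching the last cell, and an agent bounded by a depth-2 lookahead planner that uses a planning discount of 0.9. It also assumes a one-parameter family of internal rewards that adds a progress bonus `beta` to the designer's reward, so `beta = 0` recovers the designer's own reward. The outer loop scores each candidate internal reward by the designer's return, here with a brute-force search over a few values.

```python
from itertools import product

# Hypothetical toy setup (not from the paper): a corridor of cells 0..9.
N_STATES = 10
GOAL = N_STATES - 1      # the designer rewards only reaching this cell
ACTIONS = (0, -1, 1)     # stay, left, right; "stay" first, so ties favour idling
HORIZON = 20             # steps per episode
DEPTH = 2                # lookahead depth: the agent's computational bound
GAMMA = 0.9              # discount used inside the planner (assumption)


def step(state, action):
    """Deterministic corridor move, clamped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def designer_reward(next_state):
    """The designer's goal: 1.0 on reaching the goal cell, else 0.0."""
    return 1.0 if next_state == GOAL else 0.0


def make_internal_reward(beta):
    """Hypothetical reward family: designer reward plus a progress bonus.

    beta = 0 recovers the designer's own reward function.
    """
    def reward(state, next_state):
        return designer_reward(next_state) + beta * (next_state - state) / GOAL
    return reward


def plan(state, reward_fn):
    """Bounded agent: exhaustive DEPTH-step lookahead under reward_fn."""
    best_action, best_value = None, float("-inf")
    for seq in product(ACTIONS, repeat=DEPTH):
        s, value = state, 0.0
        for k, a in enumerate(seq):
            s_next = step(s, a)
            value += GAMMA ** k * reward_fn(s, s_next)
            if s_next == GOAL:   # the goal ends the episode
                break
            s = s_next
        if value > best_value:   # strict '>' keeps the earliest sequence on ties
            best_action, best_value = seq[0], value
    return best_action


def designer_return(reward_fn):
    """Run one episode with the agent optimizing reward_fn; score it by the designer's reward."""
    state, total = 0, 0.0
    for _ in range(HORIZON):
        next_state = step(state, plan(state, reward_fn))
        total += designer_reward(next_state)
        if next_state == GOAL:
            break
        state = next_state
    return total


def best_internal_reward(betas):
    """Meta-optimization by brute force: keep the beta with the highest designer return."""
    scores = {beta: designer_return(make_internal_reward(beta)) for beta in betas}
    return max(scores, key=scores.get), scores


best, scores = best_internal_reward([-0.5, 0.0, 0.5, 1.0])
print(scores)  # {-0.5: 0.0, 0.0: 0.0, 0.5: 1.0, 1.0: 1.0}
print(best)    # 0.5
```

With `beta = 0` the goal lies beyond the planner's two-step horizon, so every lookahead value ties at zero and the agent idles at the start. A positive progress bonus gives the bounded planner a signal it can see within its horizon, which raises the designer's return from 0.0 to 1.0. The grid search over `beta` is only the simplest choice for a sketch; any optimizer over reward parameters could fill the same role.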

Similar resources

Reinforcement Learning with Internal Reward for Multi-Agent Cooperation: A Theoretical Approach

This paper focuses on multi-agent cooperation, which is generally difficult to achieve without sufficient information about other agents, and proposes a reinforcement learning method that introduces an internal reward for multi-agent cooperation without sufficient information. To guarantee such cooperation, this paper theoretically derives the condition of selecting appropriate act...

A Deep Q-Learning Agent for the L-Game with Variable Batch Training

We employ the Deep Q-Learning algorithm with Experience Replay to train an agent capable of achieving a high level of play in the L-Game while self-learning from low-dimensional states. We also employ a variable batch size for training in order to mitigate the loss of the rare reward signal and significantly accelerate training. Despite the large action space due to the number of possible moves, t...

Exploration for Agents with Different Personalities in Unknown Environments

We present in this paper a personality-based architecture (PDA) that combines elements from the subsumption architecture and reinforcement learning to find alternate solutions for problems facing artificial agents exploring unknown environments. The underlying PDA algorithm is decomposed into layers according to the different (non-contiguous) stages that our agent passes through, which in turn are i...

Perceptual Reward Functions

Reinforcement learning problems are often described through rewards that indicate whether an agent has completed some task. This specification can yield desirable behavior; however, many problems are difficult to specify in this manner, as one often needs to know the proper configuration for the agent. When humans are learning to solve tasks, we often learn from visual instructions composed of images...

Adaptive Control for Multiple Cooperative Robot Arms

In this paper, we address the control problem of multiple robots manipulating a load cooperatively. First we propose a controller that ensures the asymptotic convergence of the load position and the internal forces to their desired values. Next we propose an adaptive control scheme for the multi-robot system. The adaptive controller ensures the asymptotic convergence of the load po...

Publication date: 2010